Distribution-Preserving Statistical Disclosure Limitation

Authors

  • Simon D. Woodcock
  • Simon Fraser
  • Gary Benedetto

Abstract

One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences based on the partially synthetic data, because the imputation model determines the distribution of synthetic values. We present a practical method to generate synthetic values when the imputer has only limited information about the true data-generating process. We combine a simple imputation model (such as regression) with density-based transformations that preserve the distribution of the confidential data, up to sampling error, on specified subdomains. We demonstrate through simulations and a large-scale application that our approach preserves important statistical properties of the confidential data, including higher moments, with low disclosure risk.

Keywords: statistical disclosure limitation, confidentiality, privacy, multiple imputation, partially synthetic data
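The abstract describes a two-step recipe: draw synthetic values from a simple imputation model, then apply a density-based transformation so that, within each specified subdomain, the synthetic values follow the distribution of the confidential data. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The specific choices are all assumptions made for illustration: OLS within each subdomain, a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth, rank-based quantile mapping, the hypothetical helpers `kde_cdf_inverse` and `synthesize`, and the simulated lognormal data. The sketch also ignores parameter uncertainty in the regression coefficients, which a proper multiple-imputation scheme would draw from their posterior.

```python
import math

import numpy as np

# NumPy has no elementwise erf, so vectorize the standard-library one.
_erf = np.vectorize(math.erf, otypes=[float])


def kde_cdf_inverse(values, u, grid_size=400):
    """Quantiles at probabilities u of a Gaussian kernel density estimate of values.

    Assumption for this sketch: Silverman's rule-of-thumb bandwidth.
    """
    values = np.asarray(values, dtype=float)
    h = 1.06 * values.std(ddof=1) * values.size ** (-0.2)
    grid = np.linspace(values.min() - 4 * h, values.max() + 4 * h, grid_size)
    # Smoothed CDF on the grid: average of normal CDFs centred on each observation.
    z = (grid[:, None] - values[None, :]) / (h * math.sqrt(2.0))
    cdf = (0.5 * (1.0 + _erf(z))).mean(axis=1)
    # Invert the non-decreasing CDF by linear interpolation.
    return np.interp(u, cdf, grid)


def synthesize(y, X, domains, rng):
    """Return one partially synthetic copy of the confidential variable y (hypothetical helper)."""
    y_syn = np.empty_like(y)
    for d in np.unique(domains):
        idx = domains == d
        Xd, yd = X[idx], y[idx]
        n_d = yd.size
        # Step 1 (simple imputation model): OLS within the subdomain, plus normal noise.
        # Simplification: beta is not drawn from its posterior.
        beta, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
        resid_sd = np.std(yd - Xd @ beta, ddof=Xd.shape[1])
        draws = Xd @ beta + rng.normal(0.0, resid_sd, size=n_d)
        # Step 2 (density-based transformation): keep the ranks of the draws,
        # but give them the smoothed distribution of the confidential y in this subdomain.
        ranks = np.argsort(np.argsort(draws))
        u = (ranks + 0.5) / n_d
        y_syn[idx] = kde_cdf_inverse(yd, u)
    return y_syn


# Simulated stand-in for confidential microdata (hypothetical).
rng = np.random.default_rng(0)
n = 1000
domains = rng.integers(0, 2, size=n)  # hypothetical subdomain labels
x = rng.normal(size=n)                 # covariate released unchanged
X = np.column_stack([np.ones(n), x])
y = np.exp(0.5 * x + 0.3 * domains + rng.normal(0.0, 0.5, size=n))  # skewed confidential variable

m = 5  # number of synthetic implicates
implicates = [synthesize(y, X, domains, rng) for _ in range(m)]

for d in (0, 1):
    idx = domains == d
    syn = np.concatenate([imp[idx] for imp in implicates])
    print(f"domain {d}: confidential mean={y[idx].mean():.3f} sd={y[idx].std():.3f}; "
          f"synthetic mean={syn.mean():.3f} sd={syn.std():.3f}")
```

Because the transformation only reorders quantiles of a smoothed estimate of the confidential distribution, the synthetic values inherit the regression's ranking (and hence its association with `x`), while their marginal distribution within each subdomain tracks the confidential one, including its skewness. Smoothing with a kernel, rather than simply permuting the observed values, keeps the released values from reproducing confidential values exactly.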

Related sources

Distribution-preserving statistical disclosure limitation

One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate th...

Estimation of Anonymous Email Network Characteristics through Statistical Disclosure Attacks

Social network analysis aims to obtain relational data from social systems to identify leaders, roles, and communities in order to model profiles or predict a specific behavior in users' network. Preserving anonymity in social networks is a subject of major concern. Anonymity can be compromised by disclosing senders' or receivers' identity, message content, or sender-receiver relationships. Und...

Statistical Disclosure Control for Data Privacy Preservation

With the phenomenal change in the way data are collected, stored, and disseminated among various data analysts, there is an urgent need to protect the privacy of data. When individual data are disseminated among various users, there is a high risk of revealing sensitive data related to an individual, which may raise various legal and ethical issues. Statistical Disclosure Control (SDC) ...

Privacy-Preserving Data Mining

Privacy-preserving data mining (PPDM) refers to the area of data mining that seeks to safeguard sensitive information from unsolicited or unsanctioned disclosure. Most traditional data mining techniques analyze and model the data set statistically, in aggregation, while privacy preservation is primarily concerned with protecting individual data records against disclosure. This domain separation...


Publication date: 2009